Abstract. Although there are some real world applications where the use of variable length representation (VLR) in Evolutionary Algorithms is natural and suitable, an academic framework is lacking for such representations. In this work we propose a family of tunable fitness landscapes based on VLR of genotypes. The fitness landscapes we propose possess a tunable degree of both neutrality and epistasis; they are inspired, on the one hand, by the Royal Road fitness landscapes and, on the other hand, by the NK fitness landscapes. These landscapes thus offer a continuous scale from Royal Road functions, with neutrality and no epistasis, to landscapes with a large amount of epistasis and no redundancy. To gain insight into these fitness landscapes, we first use standard tools such as adaptive walks and correlation length. Second, we evaluate the performance of evolutionary algorithms on these landscapes for various values of the neutral and the epistatic parameters; the results allow us to correlate the performance with the expected degrees of neutrality and epistasis.



1 Introduction 

Individuals in Genetic Algorithms (GA) are generally represented with strings 
of fixed length and each position of the string corresponds to one gene. So, the 
number of genes is fixed and each of them can take a fixed number of values 
(often 0 and 1). In variable length representations (VLR), such as Messy GA or
Genetic Programming, genotypes have a variable number of genes. Here, we 
consider VLR where a genotype is a sequence of symbols drawn from a finite 
alphabet and a gene is a given sub-sequence of such symbols. The main difference 
with fixed length representation is that a gene is identified by its form and not 
by its absolute position in the genotype. 

Some specific obstacles come with the variable length paradigm. One of the 
most important is the identification of genes. Indeed, during recombination, 
genes are supposed to be exchanged with others that represent similar features. 
So the design of suitable crossover operators becomes essential (see for example [1]). Another difficulty due to variable length is the tremendous amount of neutrality in the search space, as noted in [2]. Neutrality appears at different levels. First, a gene may be located at different positions in the genotype. Second, some parts of the genotype (called introns) do not perform any function and so do not contribute to fitness. A last specificity is that variable length strings introduce a new dimension in the search space, which has to be carefully explored during evolution to find regions where fitter individuals prevail. The exploration of sizes seems difficult to handle and may lead, as in Genetic Programming, to an uncontrolled growth of individuals (a phenomenon called bloat [3]).

One of the major concerns in the GA field is to characterize the difficulty of problems. One way to achieve this is to design problems with parameters controlling the main features of the search space, to run the algorithm, and to exhibit how performance varies according to the parameters. With fixed length representations, some well known families exist, such as the Royal Road functions, where inherent neutrality is controlled by the block size, or the NK-landscapes, where the tunable parameter K controls the ruggedness of the search space. With VLR, there are only a few attempts to design such academic frameworks [4]. Note, for example, the Royal Tree [5] and the Royal Road for Linear GP [1].



2 Royal Road for variable length representation 

In GA, Royal Road landscapes (RR) were originally designed to describe how building blocks are combined to produce fitter and fitter solutions and to investigate how the schemata evolution actually takes place [6]. Little work is related to RR in variable length EA; e.g. the Royal Tree Problem [5], which is an attempt to develop a benchmark for Tree-based Genetic Programming and which has been used in Clergue et al. [7] to study problem difficulty. To the best of our knowledge, there was no such work with linear structures.

In a previous work, we proposed a new kind of fitness landscape [1], called Royal Road landscapes for variable length EA (VLR Royal Road). Our
aim was to study the behavior of a crossover operator during evolution. To 
achieve this goal, we needed experiments able to highlight the destructive (or 
constructive) effects of crossover on building blocks. 

To define VLR Royal Road, we have chosen a family of optimal genotypes 
and have broken them into a set of small building blocks. Formally, the set of 
optima is: 



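$$\mathcal{G}_{opt} = \{\, g \in \mathcal{G}_\Sigma \mid \forall l \in \Sigma,\ B_b(g, l) = 1 \,\}$$

with

$$B_b(g, l) = \begin{cases} 1 & \text{if } \exists\, i \in [0, \lambda - b] \mid \forall j \in [0, b-1],\ g_{i+j} = l,\\ 0 & \text{otherwise,} \end{cases}$$

and

- $b \geq 1$ the size of blocks
- $\Sigma$ an alphabet of size $N$ that defines the set of all possible letters $l$ per locus
- $\mathcal{G}_\Sigma$ the finite set of all genotypes of size $\lambda \leq \lambda_{max}$ defined over $\Sigma$
- $\lambda_{max}$, which has to be greater than $Nb$
- $g$ a genotype of size $\lambda \leq \lambda_{max}$
- $g_k$ the $k$-th locus of $g$.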

The following genotype $g \in \mathcal{G}_\Sigma$ is an example of an optimum, with $\Sigma = \{A, T, G, C\}$ and $b = 3$:

g = AAAGTAGGGTAATTTCCCTGCG

$B_b(g, l)$ acts as a predicate accounting for the presence (or the absence) of a contiguous sequence of a single letter (i.e. a block). Note that only the presence of a block is taken into account, neither its position nor its repetition. The number of blocks corresponds to the number of letters $l \in \Sigma$ for which $B_b(g, l)$ is equal to one. In the previous example, only the blocks AAA, GGG, TTT and CCC contribute to fitness*. The contribution of each block is fixed, so the fitness $f_{Nb}(g)$ of a genotype $g \in \mathcal{G}_\Sigma$ having $n$ blocks is simply:

$$f_{Nb}(g) = \frac{1}{N} \sum_{i=1}^{N} B_b(g, l_i) = \frac{n}{N}$$

where $l_1, \ldots, l_N$ are the letters of $\Sigma$.
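The predicate $B_b$ and the fitness $f_{Nb}$ translate almost directly into code. The following is a minimal Python sketch, assuming genotypes are plain strings and the alphabet is given as a string of distinct letters; the function names `has_block` and `royal_road_fitness` are illustrative and not taken from the authors' implementation.

```python
def has_block(g: str, letter: str, b: int) -> bool:
    """B_b(g, l): True if g contains b contiguous copies of `letter` at any position."""
    return letter * b in g


def royal_road_fitness(g: str, alphabet: str, b: int) -> float:
    """f_Nb(g) = n / N, where n counts the letters of the alphabet whose block occurs in g.

    Only presence matters: the position of a block and its repetitions are ignored.
    """
    n = sum(has_block(g, letter, b) for letter in alphabet)
    return n / len(alphabet)


if __name__ == "__main__":
    print(royal_road_fitness("AAAGTAGGGTAATTTCCCTGCG", "ATGC", 3))  # 1.0 (all four blocks)
    print(royal_road_fitness("AAGTTTC", "ATGC", 3))                 # 0.25 (only TTT)
```

Testing membership of the substring `letter * b` is equivalent to the existential condition in the definition of $B_b$, since it checks every start position $i \in [0, \lambda - b]$.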

To efficiently reach an optimum, the EA system has to create and combine 
blocks without breaking existing structures. These landscapes were designed in 
such a way that fitness degradation due to crossover may occur only when recom- 
bination sites are chosen inside blocks, and never in case of blocks translocations 
or concatenations. In other words, there is no inter-block epistasis.

3 NK-Landscapes 

Kauffman [5] designed a family of problems, the NK-landscapes, to explore how 
epistasis is linked to the 'ruggedness' of search spaces. Here, epistasis corresponds 
to the degree of interaction between genes, and ruggedness is related to local 
optima, their number and especially their density. In NK-landscapes, epistasis 
can be tuned by a single parameter. Hereafter, we give a more formal definition 
of NK-landscapes followed by a summary review of their properties. 

3.1 Definition 

The fitness function of an NK-landscape is a function $f_{NK}: \{0,1\}^N \to [0,1)$ defined on binary strings with $N$ loci. Each locus $i$ represents a gene with two possible alleles, 0 or 1. An 'atom' with fixed epistasis level is represented by a fitness component $f_i: \{0,1\}^{K+1} \to [0,1)$ associated to each locus $i$. It depends on the allele at locus $i$ and also on the alleles at $K$ other epistatic loci ($K$ must fall between 0 and $N - 1$). The fitness $f_{NK}(x)$ of $x \in \{0,1\}^N$ is the average of the values of the $N$ fitness components $f_i$:



* Although the last sequence of 'CCC' is a valid block, it does not contribute to fitness since it is only another occurrence.
$$f_{NK}(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x_i; x_{i_1}, \ldots, x_{i_K})$$

where $\{i_1, \ldots, i_K\} \subset \{1, \ldots, i-1, i+1, \ldots, N\}$. Many ways have been proposed to choose the $K$ other loci among the $N$ loci of the genotype. Two possibilities are mainly used: adjacent and random neighborhoods. With an adjacent neighborhood, the $K$ genes nearest to the locus $i$ are chosen (the genotype is taken to have periodic boundaries). With a random neighborhood, the $K$ genes are chosen randomly on the genotype. Each fitness component $f_i$ is specified by extension, i.e. a number $y_{i,(x_i; x_{i_1}, \ldots, x_{i_K})}$ from $[0,1)$ is associated with each element $(x_i; x_{i_1}, \ldots, x_{i_K})$ of $\{0,1\}^{K+1}$. Those numbers are uniformly distributed in the interval $[0,1)$.
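To make the definition concrete, here is a small Python sketch of an NK-landscape with a random neighborhood, where each $f_i$ is stored by extension as a lookup table of $2^{K+1}$ uniform values. The class name and the use of Python's `random.Random` are illustrative choices, not part of the original description.

```python
import random
from itertools import product
from typing import Optional, Sequence


class NKLandscape:
    """Illustrative NK-landscape with a random neighborhood."""

    def __init__(self, n: int, k: int, seed: Optional[int] = None) -> None:
        if not 0 <= k <= n - 1:
            raise ValueError("K must lie between 0 and N - 1")
        rng = random.Random(seed)
        self.n, self.k = n, k
        # Random neighborhood: K distinct epistatic loci chosen among the other N - 1 loci.
        self.links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
        # Each f_i is given by extension: one value drawn uniformly from [0, 1)
        # for every allele combination (x_i; x_i1, ..., x_iK) in {0,1}^(K+1).
        self.tables = [
            {combo: rng.random() for combo in product((0, 1), repeat=k + 1)}
            for _ in range(n)
        ]

    def component(self, i: int, x: Sequence[int]) -> float:
        """f_i(x_i; x_i1, ..., x_iK)."""
        key = (x[i],) + tuple(x[j] for j in self.links[i])
        return self.tables[i][key]

    def fitness(self, x: Sequence[int]) -> float:
        """f_NK(x): average of the N fitness components."""
        return sum(self.component(i, x) for i in range(self.n)) / self.n


if __name__ == "__main__":
    nk = NKLandscape(n=8, k=2, seed=42)
    print(nk.fitness([0, 1, 1, 0, 1, 0, 0, 1]))
```

Seeding the generator makes a landscape instance reproducible, which is convenient when averaging measures over several random instances.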



3.2 Properties 

The NK-landscapes have been used to study links between epistasis and local 
optima. The definition of local optimum is relative to a distance metric or to a 
neighborhood choice. Here we consider that two strings of length N are neighbors 
if their Hamming distance is exactly one. A string is a local optimum if it is fitter than all of its neighbors.

The properties of NK-landscapes are given hereafter in terms of local optima: their fitness distribution, their number and their mutual distance. These results can be found in Kauffman [5], Weinberger [9] and Fontana et al. [10].

- For $K = 0$, the fitness function becomes the classical additive multi-locus model, for which:

  • There is a single and attractive global optimum.

  • There always exists a fitter neighbor (except for the global optimum).

  • Therefore the global optimum can be reached on average in $N/2$ adaptive steps.

- For $K = N - 1$, the fitness function is equivalent to a random assignment of fitnesses over the genotypic space, and so:

  • The probability that a genotype is a local optimum is $\frac{1}{N+1}$.

  • The expected number of local optima is $\frac{2^N}{N+1}$.

  • The average distance between local optima is approximately $2\ln(N-1)$.

- For small $K$, the highest local optima share many of their alleles.

- For large $K$:

  • The fitnesses of local optima are distributed with an asymptotically normal distribution with mean $m$ and variance $s$ approximately:

$$m = \mu + \sigma\sqrt{\frac{2\ln(K+1)}{K+1}}, \qquad s = \frac{(K+1)\sigma^2}{N\left(K+1+2(K+2)\ln(K+1)\right)}$$

  where $\mu$ is the expected value of $f_i$ and $\sigma^2$ its variance. In the case of the uniform distribution, $\mu = 1/2$ and $\sigma = \sqrt{1/12}$.

  • The average distance between local optima is approximately $\frac{N \log_2(K+1)}{2(K+1)}$.

  • The autocorrelation function $\rho(s)$ and the correlation length $\tau$ are:

$$\rho(s) = \left(1 - \frac{K+1}{N}\right)^s, \qquad \tau = \frac{-1}{\ln\left(1 - \frac{K+1}{N}\right)}$$
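The closed-form results above are easy to evaluate numerically, for instance to draw the theoretical curves against which ER-landscapes are compared later. A minimal Python sketch, assuming fitness components uniform on $[0,1)$ by default; the function names are hypothetical.

```python
import math


def nk_autocorrelation(n: int, k: int, s: int) -> float:
    """rho(s) = (1 - (K+1)/N)^s."""
    return (1 - (k + 1) / n) ** s


def nk_correlation_length(n: int, k: int) -> float:
    """tau = -1 / ln(1 - (K+1)/N); returns 0.0 in the limit K = N - 1."""
    rho1 = 1 - (k + 1) / n
    return -1 / math.log(rho1) if rho1 > 0 else 0.0


def random_landscape_optima(n: int) -> tuple:
    """K = N - 1: probability that a genotype is a local optimum, and expected count."""
    p = 1 / (n + 1)
    return p, (2 ** n) * p


def large_k_optima_fitness(n: int, k: int, mu: float = 0.5,
                           sigma: float = math.sqrt(1 / 12)) -> tuple:
    """Approximate mean m and variance s of local optima fitness for large K.

    The defaults correspond to fitness components drawn uniformly from [0, 1).
    """
    m = mu + sigma * math.sqrt(2 * math.log(k + 1) / (k + 1))
    s = (k + 1) * sigma ** 2 / (n * (k + 1 + 2 * (k + 2) * math.log(k + 1)))
    return m, s


if __name__ == "__main__":
    for k in range(10):
        print(k, round(nk_correlation_length(10, k), 3))
```

For $N = 10$ and $K = 0$, $\rho(1) = 0.9$ and $\tau = -1/\ln 0.9 \approx 9.49$; $\tau$ shrinks towards 0 as $K$ approaches $N - 1$.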




4 Epistatic Road for variable length representation 

In this section, we define a problem with tunable difficulty for variable length EA, 
called Epistatic Road functions (ER). To do so, we propose to use the relation 
between epistasis and difficulty. 

4.1 Definition 

Individuals in a variable length representation may be viewed as sets of inter- 
acting genes. So, in order to model such a variable length search space, we have 
to first identify genes and second explicitly define their relations. This can be 
easily done by extending the VLR Royal Road with dependencies between
the fitness contributions of blocks. Thus, genes are designated as blocks and the 
contribution of a gene depends on the presence of others, exactly as in NK- 
landscapes. 

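More formally, the fitness function of an ER-landscape is a function $f_{NKb}: \mathcal{G}_\Sigma \to [0,1)$ defined on variable length genotypes. The fitness components $f_i$ are defined in Section 3.1, and the fitness $f_{NKb}(g)$ of a genotype $g \in \mathcal{G}_\Sigma$ is the average of the $N$ fitness components $f_i$:

$$f_{NKb}(g) = \frac{1}{N} \sum_{i=1}^{N} f_i\big(B_b(g, l_i); B_b(g, l_{i_1}), \ldots, B_b(g, l_{i_K})\big)$$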



In practice, we use an implementation of NK-landscapes with random neighborhood to compute $f_i$. We have to ensure that the set of all genotypes having $N$ blocks corresponds to the end of the Road. For that purpose, we first exhaustively explore the space $\{0,1\}^N$ to find the optimum value of the NK-landscape, then we permute this space in such a way that the optimum becomes $1^N$.
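Putting the pieces together, an ER fitness can be sketched as follows: compute the block-presence vector $(B_b(g, l_1), \ldots, B_b(g, l_N))$ and feed it to an NK-landscape whose optimum has been moved to $1^N$. The text above only states that the space is permuted; the XOR mask used below is one simple permutation with that property (it also preserves Hamming neighborhoods) and is an assumption, as are the function names.

```python
import random
from itertools import product
from typing import Callable, Optional, Tuple


def make_er_fitness(alphabet: str, k: int, b: int,
                    seed: Optional[int] = None) -> Tuple[Callable[[str], float], float]:
    """Build an illustrative ER fitness f_NKb over genotypes written with `alphabet`.

    N = len(alphabet); returns (f_NKb, optimal fitness value).
    """
    n = len(alphabet)
    rng = random.Random(seed)
    # NK-landscape with random neighborhood, components given by extension.
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{c: rng.random() for c in product((0, 1), repeat=k + 1)} for _ in range(n)]

    def f_nk(x: Tuple[int, ...]) -> float:
        return sum(tables[i][(x[i],) + tuple(x[j] for j in links[i])] for i in range(n)) / n

    # Exhaustive search of {0,1}^N for the NK optimum (only feasible for small N).
    best = max(product((0, 1), repeat=n), key=f_nk)
    # Assumed permutation: XOR every vector with this mask, so that 1^N is sent to `best`.
    mask = tuple(1 - v for v in best)

    def f_er(g: str) -> float:
        # x_i = B_b(g, l_i): presence of the block of the i-th letter.
        present = tuple(int(letter * b in g) for letter in alphabet)
        return f_nk(tuple(p ^ m for p, m in zip(present, mask)))

    return f_er, f_nk(best)


if __name__ == "__main__":
    f_er, f_opt = make_er_fitness("ACGTWXYZ", k=3, b=2, seed=0)
    print(f_er("AACCGGTTWWXXYYZZ") == f_opt)  # True: all N blocks are present
```

Because the permutation is applied to the presence vector, any genotype containing all $N$ blocks reaches the optimal value, whatever its length or the order of its blocks.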

4.2 Tunability 

The properties of an ER-landscape depend on the three parameters $N$, $K$ and $b$. Although these parameters are not entirely independent, each allows us to control a particular aspect of the landscape. Increasing the parameter $N$ causes the size of both the search space and the neighborhood of a genotype to increase. Moreover, as $N$ determines the number of genes to find, the computational effort required to reach the optimum will be greater when high values of $N$ are used. The parameter $b$ controls the degree of neutrality: as $b$ increases, the size of iso-fitness sets increases. Finally, the parameter $K$ controls the number of epistatic links between genes and so the number of local optima. For $K = 0$, an ER-landscape is very close to the corresponding VLR Royal Road, since insertion of a new block in a genotype always increases the fitness. In contrast, for $K = N - 1$, with a high level of epistasis, the vast majority of the roads lead to local optima where the insertion of a new block in a genotype always decreases the fitness.

5 Fitness landscape analysis 

Many measures have been developed to describe fitness landscapes in terms of "difficulty". Here, "difficulty" refers to the ability of a local heuristic to reach the optimum. In this section some of those metrics are applied to the ER-landscapes. In particular, we show how difficulty changes according to the three parameters $N$, $K$ and $b$. The neighborhood of variable length genotypes is different from the neighborhood of fixed length genotypes. To define a neighborhood in ER-landscapes, we use String Edit Distance, such as the Levenshtein distance, which
has been already used in GP to compute or control diversity |]J , or to study 
the influence of genetic operators [T^ . By definition, the Edit Distance between 
two genotypes corresponds to the minimal number of elementary operations (deletion, insertion and substitution) required to change one genotype into the other. So two strings in the search space are neighbors if the Edit Distance between them is equal to 1. Thus a string of length $\lambda$ has $(2\lambda + 1)N$ neighbors.
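The neighbor count follows from counting the elementary operations: $\lambda$ deletions, $(\lambda+1)N$ insertions and $\lambda(N-1)$ substitutions, i.e. $\lambda + (\lambda+1)N + \lambda(N-1) = (2\lambda+1)N$. The Python sketch below enumerates these moves (the function names are illustrative) and checks them against a standard Levenshtein implementation.

```python
from typing import List


def edit_neighbors(g: str, alphabet: str) -> List[str]:
    """All genotypes at one elementary edit (deletion, insertion, substitution) from g.

    Duplicates are kept, so the list has exactly (2 * len(g) + 1) * N entries.
    """
    out = [g[:i] + g[i + 1:] for i in range(len(g))]                   # lambda deletions
    out += [g[:i] + c + g[i:] for i in range(len(g) + 1)
            for c in alphabet]                                          # (lambda + 1) * N insertions
    out += [g[:i] + c + g[i + 1:] for i in range(len(g))
            for c in alphabet if c != g[i]]                             # lambda * (N - 1) substitutions
    return out


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    moves = edit_neighbors("ACG", "ACGT")
    print(len(moves), len(set(moves)))  # 28 25
```

Note that the count is with multiplicity: inserting a letter just before or just after an identical letter yields the same string, so "ACG" over {A, C, G, T} has 28 moves but only 25 distinct neighbors.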

In order to minimize the influence of the random creation of an NK-landscape, we take the average of the following measures over 10 different landscapes for each pair of parameters $N$ and $K$. We have performed experiments for $N$ = 8, 10 and 16, for $K$ between 0 and $N - 1$ and for $b$ between 1 and 5.

5.1 Random walks, autocorrelation function and correlation length 

Weinberger [9, 14] defined the autocorrelation function and the correlation length of random walks to measure the epistasis of fitness landscapes. A random walk $\{g_t, g_{t+1}, \ldots\}$ is a series where $g_t$ is the initial genotype and $g_{i+1}$ is a randomly selected neighbor of $g_i$. The autocorrelation function $\rho$ of a fitness function $f$ is then the autocorrelation function of the time series $\{f(g_t), f(g_{t+1}), \ldots\}$:

$$\rho(s) = \frac{\langle f(g_t) f(g_{t+s}) \rangle_t - \langle f(g_t) \rangle_t^2}{\mathrm{var}(f(g_t))}$$

The correlation length $\tau$ measures how fast the correlation function decreases, and so how rugged the landscape is: the more rugged the landscape, the shorter the correlation length.

$$\tau = -\frac{1}{\ln(\rho(1))}$$
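As a sketch of how $\rho(s)$ and $\tau$ can be estimated from the fitness values recorded along a random walk, the Python functions below apply the two formulas to a single time series; averaging the estimates over many walks, as done in the experiments, is not shown, and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def autocorrelation(series: Sequence[float], s: int) -> float:
    """Empirical rho(s) = (<f_t f_{t+s}>_t - <f_t>_t^2) / var(f_t) for one time series."""
    mean = fmean(series)
    var = pvariance(series, mean)
    cov = fmean(series[t] * series[t + s] for t in range(len(series) - s)) - mean ** 2
    return cov / var


def correlation_length(series: Sequence[float]) -> float:
    """tau = -1 / ln(rho(1)); only defined when 0 < rho(1) < 1."""
    rho1 = autocorrelation(series, 1)
    if not 0 < rho1 < 1:
        raise ValueError("correlation length undefined for rho(1) = %g" % rho1)
    return -1 / math.log(rho1)
```

A smoothly varying fitness series gives $\rho(1)$ close to 1 and a long correlation length, whereas a series that alternates between two values gives $\rho(1) = -1$, for which $\tau$ is undefined.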

Empirical measures on ER landscapes were performed on 20.10'^ random 
walks of length 35 for each triplet of parameters N, K, b and for each of 10 
instances of NK-landscapes. The initial genotypes were generated by randomly choosing their length between 0 and $\lambda_{max}$ and then randomly choosing each letter of the genotype. For those random walks, $\lambda_{max}$ is equal to $2Nb$.

Fig. 3. Autocorrelation function of ER-landscape for N = 10.
Fig. 4. Theoretical autocorrelation function of NK-landscape for N = 10.

For small values of $b$, the correlation length decreases quickly when the parameter $K$ increases (see Figs. 1 and 3). As expected, the correlation of fitness between genotypes decreases with the modality due to the parameter $K$. We can compare this variation with the theoretical correlation length of NK-landscapes, given in Section 3.2


450008 M. Defoin Platel et al. 



(see Figs. 2 and 4). As $b$ increases, the influence of $K$ on the correlation length decreases: neutrality keeps a high level of correlation in spite of the increase in modality.

5.2 Adaptive walks and local optima 

Several variants of adaptive walk (often called myopic or greedy adaptive walks) exist. Here we use the series $\{g_t, g_{t+1}, \ldots, g_{t+l}\}$ where $g_t$ is the initial genotype and $g_{i+1}$ is one of the fittest neighbors of $g_i$. The walk stops on $g_{t+l}$, which is a local optimum. By computing several adaptive walks, we can estimate:

- The fitness distribution of local optima, by the distribution of the final fitnesses $f(g_{t+l})$.

- The distance between local optima, which is approximately twice the mean length $l$ of those adaptive walks.
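A minimal Python sketch of this greedy adaptive walk, written generically over a fitness function and a neighborhood function (for ER-landscapes, the edit-distance neighborhood); the names are illustrative. The stopping rule, no strictly fitter neighbor, is an assumption about the definition above, and it also makes the walk stop on neutral plateaux.

```python
import random
from typing import Callable, Iterable, Optional, Tuple, TypeVar

G = TypeVar("G")


def adaptive_walk(start: G,
                  fitness: Callable[[G], float],
                  neighbors: Callable[[G], Iterable[G]],
                  rng: Optional[random.Random] = None) -> Tuple[G, int]:
    """Greedy adaptive walk: repeatedly move to one of the fittest neighbors.

    The walk stops when no neighbor is strictly fitter than the current genotype,
    so it also stops on a neutral plateau. Returns (final genotype, walk length l).
    """
    rng = rng or random.Random()
    current, f_current, length = start, fitness(start), 0
    while True:
        scored = [(fitness(h), h) for h in neighbors(current)]
        if not scored:
            return current, length
        f_best = max(f for f, _ in scored)
        if f_best <= f_current:
            return current, length
        # Ties among the fittest neighbors are broken uniformly at random.
        current = rng.choice([h for f, h in scored if f == f_best])
        f_current = f_best
        length += 1


if __name__ == "__main__":
    onemax = lambda x: sum(x)
    flips = lambda x: [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]
    print(adaptive_walk((0, 1, 0, 0, 1), onemax, flips, random.Random(0)))  # ((1, 1, 1, 1, 1), 3)
```

Collecting the returned fitness values and lengths over many starting genotypes gives the two estimates listed above.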

Empirical measurements on ER landscapes were performed on 2.10"^ random 
walks for each triplet of parameters $N$, $K$, $b$ and for each of 10 instances of NK-landscapes. We used the same initialization procedure as for the random walks. The parameter $\lambda_{max}$ is set to 50.

Fig. 5. Mean fitness of local optima of ER-landscapes obtained with adaptive walks for N = 10.
Fig. 6. Mean length of adaptive walks on ER-landscape for N = 10.

The distribution of local optima fitnesses is close to a normal distribution. The mean fitness of local optima is represented for $N = 10$ in Figure 5; it decreases with $b$. The variations of the fitness of local optima are large for small values of $K$ but become almost insignificant for medium and high values of $K$. In Figure 6 the mean length of adaptive walks is represented for $N = 10$. As expected, it decreases with $K$ for small values of $b$. So, the parameter $K$ increases the ruggedness of the ER-landscape. On the other hand, when $b$ is



From Royal Road To Epistatic Road 450009 



higher, K has less influence on the length of the walk. Indeed, the adaptive walk 
breaks off more often on neutral plateaux. 

5.3 Neutrality 

A random walk is used to measure the neutrality of ER-landscapes. At each 
step, the number of neighbors with lower, equal and higher fitness is counted. 
We perform 2.10'^ random walks of length 20 for each triplet of parameter N, 
K and b. Table 1 gives the proportions (in %) of such neighbors for $N = 8$, $K = 4$ (they depend slightly on $N$ and $K$) and for several values of $b$. The number of equally fit neighbors is always high and is maximal for $b = 4$. So, neutral moves are a very important feature of ER-landscapes.

Table 1. Proportion (%) of lower, equal and higher neighbors (N = 8, K = 4)

Block size    Lower    Equal    Higher
b = 2           7.2     85.8       7.0
b = 3           2.8     94.4       2.8
b = 4           0.5     98.9       0.6
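The counting procedure can be sketched as follows: enumerate the edit-distance neighbors of a genotype and classify each as lower, equal or higher. In this Python sketch (names illustrative), the fitness function is a parameter; the usage example plugs in the VLR Royal Road fitness of Section 2 as a stand-in for an ER fitness, and neighbors are counted with multiplicity, which is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, Iterator


def edit_moves(g: str, alphabet: str) -> Iterator[str]:
    """Yield the (2 * len(g) + 1) * N single-edit neighbors of g, with multiplicity."""
    for i in range(len(g)):                 # deletions
        yield g[:i] + g[i + 1:]
    for i in range(len(g) + 1):             # insertions
        for c in alphabet:
            yield g[:i] + c + g[i:]
    for i in range(len(g)):                 # substitutions
        for c in alphabet:
            if c != g[i]:
                yield g[:i] + c + g[i + 1:]


def neighbor_classes(g: str, alphabet: str,
                     fitness: Callable[[str], float]) -> Dict[str, float]:
    """Fractions of lower, equal and higher neighbors of g."""
    f0 = fitness(g)
    counts = Counter()
    for h in edit_moves(g, alphabet):
        f = fitness(h)
        counts["lower" if f < f0 else "higher" if f > f0 else "equal"] += 1
    total = sum(counts.values())
    return {key: counts[key] / total for key in ("lower", "equal", "higher")}


if __name__ == "__main__":
    alphabet, b = "ACGT", 3
    rr = lambda g: sum(c * b in g for c in alphabet) / len(alphabet)  # VLR Royal Road stand-in
    print(neighbor_classes("AAACGTTGC", alphabet, rr))
```

Averaging these fractions over the genotypes visited by random walks gives proportions of the kind reported in Table 1.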



6 EA performances 

In this section, we compare the performance of an evolutionary system on ER-landscapes for various settings of the three parameters $N$, $K$ and $b$. Performance is measured by the success rate and the mean number of blocks found. In order to minimize the influence of the random creation of NK-landscapes, we take the average of these two measures over 10 different landscapes. 35 independent runs are performed with mutation and crossover rates of 0.9 and 0.3 respectively (as found in [1]). The standard one-point crossover, which blindly swaps sub-sequences of parents, was used. Note that a mutation rate of 0.9 means that each program involved in reproduction has a 0.9 probability of undergoing one insertion, one deletion and one substitution. Populations of 1000 individuals were randomly created with a maximum creation size of 50. The evolution, with elitism, a maximum program size of 100 ($\lambda_{max}$), 4-tournament selection and steady-state replacement, took place during 400 generations.
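For concreteness, the variation operators described above can be sketched in Python as follows. The text does not specify how cut points are chosen in the one-point crossover, in which order the three mutations are applied, or how the maximum program size is enforced; the choices below (independent cut points in each parent, insertion then deletion then substitution, rejection of oversized offspring) are assumptions, and the function names are illustrative.

```python
import random
from typing import Tuple


def mutate(g: str, alphabet: str, rate: float, rng: random.Random) -> str:
    """With probability `rate`, apply one insertion, one deletion and one substitution.

    The order of the three operations and the uniform choice of positions and letters
    are assumptions; the net effect leaves the length unchanged.
    """
    if rng.random() >= rate:
        return g
    i = rng.randint(0, len(g))                      # insertion point in [0, len(g)]
    g = g[:i] + rng.choice(alphabet) + g[i:]
    i = rng.randrange(len(g))                       # deletion (g is non-empty here)
    g = g[:i] + g[i + 1:]
    if g:                                           # substitution; the new letter may equal the old one
        i = rng.randrange(len(g))
        g = g[:i] + rng.choice(alphabet) + g[i + 1:]
    return g


def one_point_crossover(p1: str, p2: str, rng: random.Random,
                        max_size: int = 100) -> Tuple[str, str]:
    """Cut each parent at an independently chosen point and swap the tails.

    If an offspring would exceed `max_size`, the parents are returned unchanged
    (one possible way of enforcing the size limit).
    """
    c1 = rng.randint(0, len(p1))
    c2 = rng.randint(0, len(p2))
    o1, o2 = p1[:c1] + p2[c2:], p2[:c2] + p1[c1:]
    if len(o1) > max_size or len(o2) > max_size:
        return p1, p2
    return o1, o2
```

Because the cut points are chosen independently, offspring lengths can differ from those of their parents, which is how the search explores genotype sizes; the total length of the two offspring always equals that of the two parents.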

6.1 Results 

We have performed experiments for $N$ = 8, 10 and 16, for $K$ between 0 and $N/2$ and for $b$ between 2 and 5. We note that the case $b = 1$ is not relevant because the optimum is always found in the first generations for all values of $K$. In Figure 8 we have reported the success rate (over 35 × 10 runs) as a function of $K$ for $N = 8$. As expected, for $K = 0$ the problem is easy to solve for all values of $b$. Moreover, increasing $K$ decreases the success rate, and this phenomenon is amplified when high values of $b$ are used. For $N = 10$ and 16, too few runs find the optimum, so the variations of the success rate are not significant. Figure 7 shows the evolution of the average number of blocks of the best individual found for $N = 10$, $b = 4$ and $K$ between 0 and 5. At the beginning of the runs, the number of blocks found increases quickly, then halts after several generations. The higher $K$ is, the sooner evolution stops. This behavior looks like premature convergence and confirms experimentally that the number of local optima increases with $K$. We have also plotted the average number of blocks of the best individual found as a function of $K$ for $N = 16$ (see Fig. 9). This number decreases as $K$ or $b$ increases. These two parameters undoubtedly modify the performance and can be used independently to increase problem difficulty.

In [15], random and adaptive walks have been used to measure problem difficulty in GP. The author has shown that only the adaptive walk gives significant results on classical GP benchmarks. We have computed the correlation between these two measures and the average number of blocks found on ER, for all settings of $N$, $K$ and $b$. The correlation between the length of the adaptive walk and the number of blocks is 0.71. Conversely, the length of the random walk seems to be completely uncorrelated with performance.
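The correlation reported here can be computed from one pair of values per parameter setting $(N, K, b)$, e.g. the mean adaptive walk length and the mean number of blocks found. The text does not name the coefficient; the sketch below assumes Pearson's correlation coefficient and uses a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near 1 means that settings with longer adaptive walks also tend to yield more blocks found, while a value near 0 means no linear relationship.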



Fig. 7. Evolution of average number of blocks found on ER N = 10 and b = 4 (one curve for each K = 0, ..., 5; x-axis: generation, 0 to 400; y-axis: average number of blocks, up to 10).

Fig. 8. Success rate as a function of K on ER N = 8.

Fig. 9. Average number of blocks found as a function of K on ER N = 16.



7 Conclusion

We think that a better understanding of the implications of variable length representations on Evolutionary Algorithms would allow researchers to use these structures more efficiently. In this paper, our goal is to investigate which kinds of properties could influence the difficulty of such problems. We have chosen two features of search spaces, neutrality and ruggedness, and we have designed a family of problems, the Epistatic Road landscapes, where those features can be tuned independently.

Statistical measures computed on ER-landscapes have shown that, similarly to NK-landscapes, tuning the epistatic coupling parameter $K$ increases ruggedness. Moreover, as for Royal Road functions, tuning the block size parameter $b$ increases neutrality.

The experiments that we have performed with a VLR evolutionary algorithm have demonstrated the expected difficulty according to parameters $b$ and $K$. Although our results cannot be directly transposed to real world problems, mainly because our initial hypotheses are too simple, in particular about the nature of building blocks, we have a ready-to-use VLR problem of tunable difficulty, which allows us to study the effects of genetic operators and the dynamics of the evolutionary process.